library(topGO)
library(knitr)   # to use kable()
library(limma)   # to use vennDiagram()
library(ggplot2)
TAG <- params$TAG
VAR <- params$VAR
ANNOTATION <- list(genes = '../2019-07-26/genes/annotation.txt',
                   isoforms = '../2019-07-26/transcripts/annotation.txt')
TAGDIR <- paste0('../2020-01-08/', TAG, '/')

Introduction

This is the enrichment analysis for genes regulated by hatching. Because I quantified the association of expression with hatching in two different ways, this document has two sections. Section Variance reports the enrichment analysis performed with genes ordered by the proportion of expression variance explained by hatching. Note that some genes with low variance may still be highly associated with hatching, even if the fold change between levels of this factor is low. Section Differential expression uses an ordering of genes based on the significance of the differential expression between levels of hatching, which does depend on fold change.

Reading the data

Functional annotation is in 2019-07-26. I will also upload two lists of genes, with either proportion of variance explained by hatching or p-value of differential expression test.

tagVariance <- read.table(paste0(TAGDIR, VAR, '_variance.txt'))
tagPValue   <- read.table(paste0(TAGDIR, VAR, '_pvalue.txt'))
annotation  <- read.table(ANNOTATION[[TAG]], col.names = c('tagname', 'goterms'))

To initialize the topGOdata object, I will need the gene list as a named numeric vector, where the names are the genes identifiers and the numeric values, either the portion of variance explained by hatching or the p-values of the differential expression test. The structure() function adds attributes to an object.

Variance <- structure(tagVariance[,1], names = row.names(tagVariance))
PValues  <- structure(tagPValue[,1],   names = row.names(tagPValue))
rm(tagVariance, tagPValue)

There are 18598 genes scored with a variance portion and a differential expression p-value. It should actually be the exact same genes. The annotation data frame has more than one GO term for every tag, separated by |. I need a named list.

head(annotation)
##       tagname                                                           goterms
## 1 XLOC_000002                                  GO:0008417|GO:0016020|GO:0006486
## 2 XLOC_000009                                  GO:0043130|GO:0005515|GO:0043161
## 3 XLOC_000010                       GO:0005840|GO:0015935|GO:0003735|GO:0006412
## 4 XLOC_000015 GO:0006355|GO:0003700|GO:0006357|GO:0043565|GO:0005634|GO:0032502
## 5 XLOC_000021                       GO:0016567|GO:0006397|GO:0061630|GO:0008270
## 6 XLOC_000036                                  GO:0005272|GO:0006814|GO:0016020
allgenes2GO <- strsplit(as.character(annotation$goterms), "|", fixed = TRUE)
names(allgenes2GO) <- annotation$tagname
rm(annotation)

There are 9691 genes with GO annotations. But the differential expression analysis includes many more genes. In order to include the not-annotated genes in the enrichment analysis, to see if annotated or not annotated genes are more or less often differentially expressed, I should attribute a GO term to them. According to [http://geneontology.org/docs/faq/] nowadays we express lack of annotation by annotating to the root nodes, i.e. GO:0008150 biological_process, GO:0003674 molecular_function, and GO:0005575 cellular_component.

for (tag in unique(c(names(PValues), names(Variance)))) {
   if (! tag %in% names(allgenes2GO)) {
      allgenes2GO <- append(allgenes2GO,
         structure(list(c("GO:0008150", "GO:0003674", "GO:0005575")), names = tag))
   }
}

Using differential expression p-values

Building the topGO object

Creation of a topGO dataset is documented in section 4 of topGO’s the user manual: https://bioconductor.org/packages/release/bioc/vignettes/topGO/inst/doc/topGO.pdf. I need to use the new command, and fill up the slots. The annot object must be a function that compiles “a list of GO terms such that each element in the list is a character vector containing all the gene identifiers that are mapped to the respective GO term.” There are several options, that you can check using help(annFUN.gene2GO), for example. The annFUN.gene2GO requires a gene2GO argument, which is the list of gene-to-GO terms I made before. The geneSelectionFun object is a function that selects the interesting (most significant) genes. It is required to perform Fisher’s exact test. The nodeSize is used to prune the GO hierarchy from the terms which have less than n annotated genes.

I generate three datasets, to analyse each of the three ontologies.

selection <- function(allScores) {return(allScores < 0.01)}
GOdata.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdata.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = PValues,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), numGenes),
   Num_GO_terms = sapply(list(GOdata.BP, GOdata.MF, GOdata.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

There are more than one way to test for enrichment. Something that took me a while to understand is that not only there are different test statistics (Fisher’s exact test, Kolmogorov-Smirnov, and others) but also different algorithms: classic, elim, weight… The algorithms are ways to deal with the dependence structure among GO terms due to topology. Some algorithms are compatible with all statistics implemented in topGO. But the weight and the parentchild algorithms are only applicable to Fisher’s exact test. I am not interested in the classic algorithm, which treats GO nodes as independent, and therefore produces an excess of significant results. I will not use the Fisher’s exact test, because it dependes on an arbitrary threshold of significance on non-adjusted p-values.

BP.elim     <- runTest(GOdata.BP, algorithm = "elim",     statistic = "ks")
BP.weight01 <- runTest(GOdata.BP, algorithm = "weight01", statistic = "ks")
BP.lea      <- runTest(GOdata.BP, algorithm = "lea",      statistic = "ks")
MF.elim     <- runTest(GOdata.MF, algorithm = "elim",     statistic = "ks")
MF.weight01 <- runTest(GOdata.MF, algorithm = "weight01", statistic = "ks")
MF.lea      <- runTest(GOdata.MF, algorithm = "lea",      statistic = "ks")
CC.elim     <- runTest(GOdata.CC, algorithm = "elim",     statistic = "ks")
CC.weight01 <- runTest(GOdata.CC, algorithm = "weight01", statistic = "ks")
CC.lea      <- runTest(GOdata.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) length(score(X))),
   Significant = sapply(list(BP.elim, BP.weight01, BP.lea, MF.elim, MF.weight01, MF.lea, CC.elim, CC.weight01, CC.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 23
BP weight01 1154 12
BP lea 1154 45
MF elim 576 13
MF weight01 576 14
MF lea 576 25
CC elim 291 6
CC weight01 291 4
CC lea 291 9
rm(ResultsSummary)

Results

The topGO package only looks for GO terms that are overrepresented among the top of the gene list (genes with lowest score, assumed to be a \(p\) value; see 2020-06-30). Thus, underrepresented GO terms are ignored, even though it could be interesting to know if among differentially expressed genes there are fewer genes annotated with certain terms than expected by chance. The way to identify those terms would be to reverse the scores. In this case, running the analysis with \(1 - p\) values would be to search for terms that are overrepresented among non-differentially expressed genes. Not worth pursuing now.

Note that, despite my initial confusion (see previous commits), it is not true that significant terms with fewer significant genes than expected are significantly underrepresented terms. It is just possible that a term is significant because of the overall distribution of scores of genes annotated with that term, even if none of those genes (or just fewer than expected) are significant. This is because of the use of Kolmogorov-Smirnov test, instead of the Fisher’s exact test. Note that even though no hard threshold is necessary in the K-S test, GenTable() produces a summary table with number of annotated and significant genes for every significant GO term. That number of significant term is determined by the selection() function passed to the topGO object, only for display purposes unless using Fisher’s exact test.

Biological process

orderedTerms <- names(sort(score(BP.weight01)))
significant.weight01 <- score(BP.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BP.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BP.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.pvalue.sigTerms <- sigTerms

BP.all <- GenTable(GOdata.BP, elim=BP.elim, weight01=BP.weight01, lea=BP.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
write.table(BP.all, file=paste(TAG, VAR, 'BPsummary.tsv', sep='/'),
            quote=FALSE, sep='\t', row.names=FALSE)

kable(BP.all,
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0060271 cilium assembly 46 0 0.34 1 8e-07 6.9e-06 8e-07
GO:0046907 intracellular transport 202 0 1.51 439 0.3577 0.0014 0.35774
GO:0006486 protein glycosylation 75 1 0.56 4 0.0018 0.0015 0.00177
GO:1901264 carbohydrate derivative transport 11 0 0.08 212 0.1233 0.0029 0.12333
GO:0015931 nucleobase-containing compound transport 29 0 0.22 498 0.4205 0.0029 0.42046
GO:0034314 Arp2/3 complex-mediated actin nucleation 12 0 0.09 8 0.0031 0.0031 0.00315
GO:0007264 small GTPase mediated signal transductio… 81 0 0.60 10 0.0039 0.0032 0.00393
GO:0072330 monocarboxylic acid biosynthetic process 24 1 0.18 595 0.5189 0.0044 0.51887
GO:0043604 amide biosynthetic process 215 1 1.60 696 0.6011 0.0044 0.47885
GO:0006418 tRNA aminoacylation for protein translat… 40 0 0.30 13 0.0044 0.0044 0.00445
GO:0009968 negative regulation of signal transducti… 9 0 0.07 9 0.0036 0.0064 0.00365
GO:0016579 protein deubiquitination 40 0 0.30 17 0.0064 0.0064 0.00642
GO:0003341 cilium movement 10 0 0.07 30 0.0110 0.0110 0.01100
GO:0001522 pseudouridine synthesis 13 0 0.10 34 0.0120 0.0120 0.01201
GO:0035435 phosphate ion transmembrane transport 8 1 0.06 35 0.0123 0.0123 0.01230
GO:0044341 sodium-dependent phosphate transport 8 1 0.06 36 0.0123 0.0123 0.01230
GO:0006511 ubiquitin-dependent protein catabolic pr… 102 1 0.76 714 0.6260 0.0135 0.62602
GO:0006012 galactose metabolic process 6 0 0.04 39 0.0143 0.0143 0.01430
GO:0098813 nuclear chromosome segregation 27 1 0.20 436 0.3523 0.0175 0.35227
GO:0035556 intracellular signal transduction 173 0 1.29 23 0.0096 0.0182 0.00022
GO:1901135 carbohydrate derivative metabolic proces… 300 2 2.24 370 0.2625 0.0190 0.45764
GO:0019637 organophosphate metabolic process 204 0 1.52 511 0.4352 0.0191 0.68997
GO:0006030 chitin metabolic process 74 0 0.55 224 0.1318 0.0191 0.13182
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term = Term(GOTERM[sigTerms]),
                 Definition = Definition(GOTERM[sigTerms]),
                 PValue=score(BP.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with',
                      VAR, 'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0000069
GO:0006486 protein glycosylation A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. 0.0015156
GO:0034314 Arp2/3 complex-mediated actin nucleation The actin nucleation process in which actin monomers combine to form a new branch on the side of an existing actin filament; mediated by the Arp2/3 protein complex and its interaction with other proteins. 0.0031481
GO:0007264 small GTPase mediated signal transduction Any series of molecular signals in which a small monomeric GTPase relays one or more of the signals. 0.0032176
GO:0006418 tRNA aminoacylation for protein translation The synthesis of aminoacyl tRNA by the formation of an ester bond between the 3’-hydroxyl group of the most 3’ adenosine of the tRNA and the alpha carboxylic acid group of an amino acid, to be used in ribosome-mediated polypeptide synthesis. 0.0044477
GO:0009968 negative regulation of signal transduction Any process that stops, prevents, or reduces the frequency, rate or extent of signal transduction. 0.0063763
GO:0016579 protein deubiquitination The removal of one or more ubiquitin groups from a protein. 0.0064217

I think the GO graph is useful to see the relationship among the significant terms. But too large graphs are impossible to read. I don’t know how to split the graph in meaningful subgraphs.

showSigOfNodes(GOdata.BP, score(BP.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 186 
## Number of Edges = 347 
## 
## $complete.dag
## [1] "A graph with 186 nodes."

This is just a example of the most significant GO term:

showGroupDensity(GOdata.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MF.weight01)))
significant.weight01 <- score(MF.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MF.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MF.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.pvalue.sigTerms <- sigTerms

MF.all <- GenTable(GOdata.MF, elim=MF.elim, weight01=MF.weight01, lea=MF.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
write.table(MF.all, file=paste(TAG, VAR, 'MFsummary.tsv', sep='/'),
            quote=FALSE, sep='\t', row.names=FALSE)

kable(MF.all,
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0005515 protein binding 2101 16 14.95 1 5e-08 1.6e-10 1.6e-09
GO:0008417 fucosyltransferase activity 33 0 0.23 3 0.00076 0.00076 0.00076
GO:0005509 calcium ion binding 361 2 2.57 4 0.00088 0.00088 0.00088
GO:0036459 thiol-dependent ubiquitinyl hydrolase ac… 43 0 0.31 27 0.01870 0.00230 0.01870
GO:0004812 aminoacyl-tRNA ligase activity 42 0 0.30 5 0.00240 0.00240 0.00240
GO:0016757 transferase activity, transferring glyco… 194 2 1.38 2 0.00039 0.00259 1.2e-06
GO:1901505 carbohydrate derivative transmembrane tr… 22 0 0.16 288 0.45210 0.00293 0.45210
GO:0008536 Ran GTPase binding 14 0 0.10 6 0.00318 0.00318 0.00318
GO:0004707 MAP kinase activity 6 0 0.04 7 0.00634 0.00634 0.00634
GO:0004842 ubiquitin-protein transferase activity 76 1 0.54 150 0.18724 0.00678 0.18724
GO:0005085 guanyl-nucleotide exchange factor activi… 58 1 0.41 60 0.04614 0.00722 0.04614
GO:0008061 chitin binding 77 0 0.55 9 0.00757 0.00757 0.00757
GO:0005524 ATP binding 786 4 5.59 10 0.00877 0.00877 0.00877
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MF.weight01)[sigTerms]),
      caption = paste('Molecular function terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular function terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0008417 fucosyltransferase activity Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0007586
GO:0005509 calcium ion binding Interacting selectively and non-covalently with calcium ions (Ca2+). 0.0008786
GO:0004812 aminoacyl-tRNA ligase activity Catalysis of the formation of aminoacyl-tRNA from ATP, amino acid, and tRNA with the release of diphosphate and AMP. 0.0024049
GO:0016757 transferase activity, transferring glycosyl groups Catalysis of the transfer of a glycosyl group from one compound (donor) to another (acceptor). 0.0025943
GO:0008536 Ran GTPase binding Interacting selectively and non-covalently with Ran, a conserved Ras-like GTP-binding protein, implicated in nucleocytoplasmic transport, cell cycle progression, spindle assembly, nuclear organization and nuclear envelope (NE) assembly. 0.0031825
GO:0004707 MAP kinase activity Catalysis of the reaction: protein + ATP = protein phosphate + ADP. This reaction is the phosphorylation of proteins. Mitogen-activated protein kinase; a family of protein kinases that perform a crucial step in relaying signals from the plasma membrane to the nucleus. They are activated by a wide range of proliferation- or differentiation-inducing signals; activation is strong with agonists such as polypeptide growth factors and tumor-promoting phorbol esters, but weak (in most cell backgrounds) by stress stimuli. 0.0063440
GO:0008061 chitin binding Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0075727
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0087728
GO:0008375 acetylglucosaminyltransferase activity Catalysis of the transfer of an N-acetylglucosaminyl residue from UDP-N-acetyl-glucosamine to a sugar. 0.0092534
showSigOfNodes(GOdata.MF, score(MF.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 61 
## Number of Edges = 78 
## 
## $complete.dag
## [1] "A graph with 61 nodes."
showGroupDensity(GOdata.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CC.weight01)))
significant.weight01 <- score(CC.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CC.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CC.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.pvalue.sigTerms <- sigTerms

CC.all <- GenTable(GOdata.CC, elim=CC.elim, weight01=CC.weight01, lea=CC.lea,
                   orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

write.table(CC.all, file=paste(TAG, VAR, 'CCsummary.tsv', sep='/'),
            quote=FALSE, sep='\t', row.names=FALSE)

kable(CC.all,
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0030286 dynein complex 26 0 0.18 1 0.00024 0.0022 0.00024
GO:0032991 protein-containing complex 886 4 6.23 31 0.04084 0.0037 0.03349
GO:0005815 microtubule organizing center 37 1 0.26 11 0.01340 0.0046 0.01340
GO:0005885 Arp2/3 protein complex 8 0 0.06 5 0.00799 0.0080 0.00799
GO:0005680 anaphase-promoting complex 7 0 0.05 7 0.01124 0.0112 0.01124
GO:0044428 nuclear part 236 0 1.66 39 0.05022 0.0122 0.07962
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CC.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0030286 dynein complex Any of several large complexes that contain two or three dynein heavy chains and several light chains, and have microtubule motor activity. 0.0021741
GO:0005885 Arp2/3 protein complex A stable protein complex that contains two actin-related proteins, Arp2 and Arp3, and five novel proteins (ARPC1-5), and functions in the nucleation of branched actin filaments. 0.0079859
showSigOfNodes(GOdata.CC, score(CC.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 30 
## Number of Edges = 48 
## 
## $complete.dag
## [1] "A graph with 30 nodes."
showGroupDensity(GOdata.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdata.CC, orderedTerms[3], rm.one=FALSE)

Using the portion of variance explained by hatching

Building the topGO object

I need to generate the topGO objects again, using the alternative gene ordering, based on the proportion of expression-level variance explained by hatching. I miss a way to inform the topGOdata object that the score in this case is reversed, with respect to \(p\)-values: the higher it is, the more differentially expressed the gene is. To make sure that GO terms are tested in the same way than when using p-values, I will just reverse the proportion of variance explained by hatching to its complement. Taking this into account, the subset of interesting genes (selection() function) must be defined as the lowest 10% scores, which are the 10% genes with largest portion of variance explained by hatching.

selection <- function(allScores) {return(allScores <= quantile(allScores, probs = 0.10))}
GOdataVar.BP <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'BP', 
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.MF <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'MF',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

GOdataVar.CC <- new('topGOdata',
   description = 'Differential expression among Brachionus plicatilis populations.',
   ontology = 'CC',
   allGenes = 1.0 - Variance,
   annot = annFUN.gene2GO,
   gene2GO = allgenes2GO,
   geneSelectionFun = selection,
   nodeSize = 5)

library(knitr)
DataSummary <- data.frame(ontology = c('BP', 'MF', 'CC'),
   Num_Genes = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), numGenes),
   Num_GO_terms = sapply(list(GOdataVar.BP, GOdataVar.MF, GOdataVar.CC), function(x) length(usedGO(x))))
kable(DataSummary, caption='Number of feasible genes or transcripts and number of GO terms used in each data set.')
rm(DataSummary)
Number of feasible genes or transcripts and number of GO terms used in each data set.
ontology Num_Genes Num_GO_terms
BP 15162 1154
MF 17429 576
CC 12940 291

Running the tests

BPvar.elim     <- runTest(GOdataVar.BP, algorithm = "elim",     statistic = "ks")
BPvar.weight01 <- runTest(GOdataVar.BP, algorithm = "weight01", statistic = "ks")
BPvar.lea      <- runTest(GOdataVar.BP, algorithm = "lea",      statistic = "ks")
MFvar.elim     <- runTest(GOdataVar.MF, algorithm = "elim",     statistic = "ks")
MFvar.weight01 <- runTest(GOdataVar.MF, algorithm = "weight01", statistic = "ks")
MFvar.lea      <- runTest(GOdataVar.MF, algorithm = "lea",      statistic = "ks")
CCvar.elim     <- runTest(GOdataVar.CC, algorithm = "elim",     statistic = "ks")
CCvar.weight01 <- runTest(GOdataVar.CC, algorithm = "weight01", statistic = "ks")
CCvar.lea      <- runTest(GOdataVar.CC, algorithm = "lea",      statistic = "ks")

ResultsSummary <- data.frame(ontology = rep(c("BP", "MF", "CC"), each = 3),
   algorithm = rep(c("elim", "weight01", "lea"), 3),
   TermsTested = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) length(score(X))),
   Significant = sapply(list(BPvar.elim, BPvar.weight01, BPvar.lea, MFvar.elim, MFvar.weight01, MFvar.lea, CCvar.elim, CCvar.weight01, CCvar.lea), function(X) sum(score(X) < 0.01)))

kable(ResultsSummary, caption="Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.")
Number of non-trivial terms tested and those with a score (not corrected p-value) lower than 0.01.
ontology algorithm TermsTested Significant
BP elim 1154 39
BP weight01 1154 18
BP lea 1154 74
MF elim 576 22
MF weight01 576 17
MF lea 576 29
CC elim 291 11
CC weight01 291 9
CC lea 291 22
rm(ResultsSummary)

Results

Biological process

orderedTerms <- names(sort(score(BPvar.weight01)))
significant.weight01 <- score(BPvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(BPvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(BPvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
BP.variance.sigTerms <- sigTerms

BPvar.all <- GenTable(GOdataVar.BP, elim=BPvar.elim, weight01=BPvar.weight01, lea=BPvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))

kable(BPvar.all,
   caption = "Most over-represented terms of the Biological Process ontology.")
Most over-represented terms of the Biological Process ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0046907 intracellular transport 202 30 19.97 136 0.03824 0.00018 0.00241
GO:0060271 cilium assembly 46 2 4.55 1 7e-05 0.00063 7e-05
GO:0007264 small GTPase mediated signal transductio… 81 12 8.01 3 0.00034 0.00075 0.00034
GO:0006486 protein glycosylation 75 7 7.41 19 0.00314 0.00092 0.00314
GO:0006511 ubiquitin-dependent protein catabolic pr… 102 7 10.08 456 0.29323 0.00113 0.29323
GO:0006030 chitin metabolic process 74 4 7.32 55 0.01247 0.00163 0.01247
GO:0007020 microtubule nucleation 11 2 1.09 10 0.00224 0.00224 0.00224
GO:0015931 nucleobase-containing compound transport 29 3 2.87 211 0.08131 0.00225 0.08131
GO:1901264 carbohydrate derivative transport 11 2 1.09 150 0.04630 0.00225 0.04630
GO:0006413 translational initiation 18 0 1.78 12 0.00268 0.00268 0.00268
GO:0034314 Arp2/3 complex-mediated actin nucleation 12 4 1.19 17 0.00291 0.00291 0.00291
GO:0006417 regulation of translation 16 3 1.58 47 0.01119 0.00423 0.01119
GO:0040029 regulation of gene expression, epigeneti… 6 0 0.59 23 0.00481 0.00481 0.00481
GO:0048193 Golgi vesicle transport 43 6 4.25 428 0.25589 0.00506 0.36524
GO:0016310 phosphorylation 319 21 31.54 557 0.38238 0.00570 0.63116
GO:0032006 regulation of TOR signaling 7 3 0.69 28 0.00672 0.00672 0.00672
GO:0006302 double-strand break repair 21 4 2.08 8 0.00189 0.00724 0.00189
GO:1902533 positive regulation of intracellular sig… 6 3 0.59 35 0.00967 0.00967 0.00967
GO:0016579 protein deubiquitination 40 3 3.95 44 0.01061 0.01061 0.01061
GO:0006012 galactose metabolic process 6 2 0.59 46 0.01093 0.01093 0.01093
GO:0006418 tRNA aminoacylation for protein translat… 40 1 3.95 57 0.01302 0.01302 0.01302
GO:0046854 phosphatidylinositol phosphorylation 11 2 1.09 62 0.01364 0.01364 0.01364
GO:0006672 ceramide metabolic process 8 3 0.79 336 0.16508 0.01517 0.16508
GO:0044262 cellular carbohydrate metabolic process 28 1 2.77 875 0.70059 0.01608 0.70059
GO:0006357 regulation of transcription by RNA polym… 58 7 5.73 119 0.03000 0.01610 0.03000
GO:0006606 protein import into nucleus 9 2 0.89 84 0.01887 0.01887 0.01887
GO:0006325 chromatin organization 60 13 5.93 49 0.01146 0.01903 0.00282
GO:0007275 multicellular organism development 41 5 4.05 335 0.16442 0.01931 0.16442
GO:0051650 establishment of vesicle localization 6 2 0.59 90 0.01972 0.01972 0.01972
GO:0006506 GPI anchor biosynthetic process 24 5 2.37 92 0.02027 0.02027 0.02027
GO:0009967 positive regulation of signal transducti… 9 5 0.89 99 0.02283 0.02283 0.00026
GO:0016573 histone acetylation 12 5 1.19 101 0.02325 0.02325 0.02325
GO:0033365 protein localization to organelle 38 3 3.76 30 0.00726 0.02412 0.00726
GO:0031323 regulation of cellular metabolic process 423 53 41.82 163 0.05053 0.02474 0.00081
GO:0009968 negative regulation of signal transducti… 9 3 0.89 18 0.00300 0.02493 0.00300
GO:0050953 sensory perception of light stimulus 8 2 0.79 666 0.49177 0.02495 0.49177
GO:0007156 homophilic cell adhesion via plasma memb… 21 3 2.08 106 0.02580 0.02580 0.02580
GO:0009894 regulation of catabolic process 5 2 0.49 112 0.02793 0.02793 0.02793
GO:0006457 protein folding 49 3 4.84 113 0.02800 0.02800 0.02800
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(BPvar.weight01)[sigTerms]),
      caption = paste('Biological process terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Biological process terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0060271 cilium assembly The assembly of a cilium, a specialized eukaryotic organelle that consists of a filiform extrusion of the cell surface. Each cilium is bounded by an extrusion of the cytoplasmic membrane, and contains a regular longitudinal array of microtubules, anchored basally in a centriole. 0.0006298
GO:0007264 small GTPase mediated signal transduction Any series of molecular signals in which a small monomeric GTPase relays one or more of the signals. 0.0007542
GO:0006486 protein glycosylation A protein modification process that results in the addition of a carbohydrate or carbohydrate derivative unit to a protein amino acid, e.g. the addition of glycan chains to proteins. 0.0009226
GO:0007020 microtubule nucleation The process in which tubulin alpha-beta heterodimers begin aggregation to form an oligomeric tubulin structure (a microtubule seed). Microtubule nucleation is the initiating step in the formation of a microtubule in the absence of any existing microtubules (‘de novo’ microtubule formation). 0.0022416
GO:0006413 translational initiation The process preceding formation of the peptide bond between the first two amino acids of a protein. This includes the formation of a complex of the ribosome, mRNA or circRNA, and an initiation complex that contains the first aminoacyl-tRNA. 0.0026807
GO:0034314 Arp2/3 complex-mediated actin nucleation The actin nucleation process in which actin monomers combine to form a new branch on the side of an existing actin filament; mediated by the Arp2/3 protein complex and its interaction with other proteins. 0.0029146
GO:0040029 regulation of gene expression, epigenetic Any process that modulates the frequency, rate or extent of gene expression; the process is mitotically or meiotically heritable, or is stably self-propagated in the cytoplasm of a resting cell, and does not entail a change in DNA sequence. 0.0048076
GO:0032006 regulation of TOR signaling Any process that modulates the frequency, rate or extent of TOR signaling. 0.0067152
GO:0006302 double-strand break repair The repair of double-strand breaks in DNA via homologous and nonhomologous mechanisms to reform a continuous DNA helix. 0.0072352
GO:1902533 positive regulation of intracellular signal transduction NA 0.0096696
showSigOfNodes(GOdataVar.BP, score(BPvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 290 
## Number of Edges = 563 
## 
## $complete.dag
## [1] "A graph with 290 nodes."

Below I plot variance portion, but for the termo found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.BP, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.BP, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.BP, orderedTerms[3], rm.one=FALSE)

Molecular function

orderedTerms <- names(sort(score(MFvar.weight01)))
significant.weight01 <- score(MFvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(MFvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(MFvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
MF.variance.sigTerms <- sigTerms

MFvar.all <- GenTable(GOdataVar.MF, elim=MFvar.elim, weight01=MFvar.weight01, lea=MFvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=sum(significant.elim))
kable(MFvar.all,
   caption = "Most over-represented terms of the Molecular Function ontology.")
Most over-represented terms of the Molecular Function ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0005515 protein binding 2101 235 207.34 1 3.9e-07 1.9e-08 4.4e-09
GO:0008536 Ran GTPase binding 14 3 1.38 2 4.7e-05 4.7e-05 4.7e-05
GO:0043015 gamma-tubulin binding 9 3 0.89 3 0.00012 0.00012 0.00012
GO:0003743 translation initiation factor activity 28 1 2.76 4 0.00015 0.00015 0.00015
GO:0008061 chitin binding 77 7 7.60 5 0.00026 0.00026 0.00026
GO:0042578 phosphoric ester hydrolase activity 140 9 13.82 495 0.86923 0.00082 0.86923
GO:0008417 fucosyltransferase activity 33 4 3.26 8 0.00114 0.00114 0.00114
GO:1901505 carbohydrate derivative transmembrane tr… 22 2 2.17 264 0.37713 0.00221 0.37713
GO:0003714 transcription corepressor activity 10 5 0.99 12 0.00369 0.00369 0.00369
GO:0036459 thiol-dependent ubiquitinyl hydrolase ac… 43 3 4.24 51 0.02297 0.00395 0.02297
GO:0016757 transferase activity, transferring glyco… 194 21 19.15 66 0.03258 0.00403 1.1e-05
GO:0060090 molecular adaptor activity 10 1 0.99 122 0.12197 0.00527 0.12197
GO:0005524 ATP binding 786 66 77.57 15 0.00571 0.00571 0.00571
GO:0004386 helicase activity 46 8 4.54 44 0.01886 0.00713 0.01886
GO:0005096 GTPase activator activity 34 4 3.36 16 0.00735 0.00735 0.00735
GO:0004312 fatty acid synthase activity 5 3 0.49 19 0.00912 0.00912 0.00912
GO:0005102 signaling receptor binding 30 4 2.96 112 0.09745 0.00973 0.09745
GO:0004842 ubiquitin-protein transferase activity 76 13 7.50 65 0.03179 0.01003 0.03179
GO:0015932 nucleobase-containing compound transmemb… 23 2 2.27 288 0.44074 0.01062 0.44074
GO:0008194 UDP-glycosyltransferase activity 35 5 3.45 10 0.00203 0.01153 0.00203
GO:0003993 acid phosphatase activity 8 1 0.79 26 0.01198 0.01198 0.01198
GO:0004707 MAP kinase activity 6 0 0.59 27 0.01238 0.01238 0.01238
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(MFvar.weight01)[sigTerms]),
      caption = paste('Molecular functions terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Molecular functions terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0005515 protein binding Interacting selectively and non-covalently with any protein or protein complex (a complex of two or more proteins that may include other nonprotein molecules). 0.0000000
GO:0008536 Ran GTPase binding Interacting selectively and non-covalently with Ran, a conserved Ras-like GTP-binding protein, implicated in nucleocytoplasmic transport, cell cycle progression, spindle assembly, nuclear organization and nuclear envelope (NE) assembly. 0.0000471
GO:0043015 gamma-tubulin binding Interacting selectively and non-covalently with the microtubule constituent protein gamma-tubulin. 0.0001188
GO:0003743 translation initiation factor activity Functions in the initiation of ribosome-mediated translation of mRNA into a polypeptide. 0.0001548
GO:0008061 chitin binding Interacting selectively and non-covalently with chitin, a linear polysaccharide consisting of beta-(1->4)-linked N-acetyl-D-glucosamine residues. 0.0002588
GO:0008417 fucosyltransferase activity Catalysis of the transfer of a fucosyl group to an acceptor molecule, typically another carbohydrate or a lipid. 0.0011400
GO:0003714 transcription corepressor activity A protein or a member of a complex that interacts specifically and non-covalently with a DNA-bound DNA-binding transcription factor to repress the transcription of specific genes. Corepressors often act by altering chromatin structure and modifications. For example, one class of transcription coregulators modifies chromatin structure through covalent modification of histones. A second ATP-dependent class modifies the conformation of chromatin. A third class occludes DNA-binding transcription factor protein-protein interaction domains. A third class of corepressors prevents interactions of DNA bound DNA-binding transcription factor with coactivators. 0.0036872
GO:0005524 ATP binding Interacting selectively and non-covalently with ATP, adenosine 5’-triphosphate, a universally important coenzyme and enzyme regulator. 0.0057069
GO:0005096 GTPase activator activity Binds to and increases the activity of a GTPase, an enzyme that catalyzes the hydrolysis of GTP. 0.0073531
GO:0004312 fatty acid synthase activity Catalysis of the reaction: acetyl-CoA + n malonyl-CoA + 2n NADPH + 2n H+ = long-chain fatty acid + n+1 CoA + n CO2 + 2n NADP+. 0.0091154
showSigOfNodes(GOdataVar.MF, score(MFvar.weight01),
               firstSigNodes = sum(significant.elim),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 81 
## Number of Edges = 97 
## 
## $complete.dag
## [1] "A graph with 81 nodes."

I plot variance portion, but for the term found most significant when using p-values, for comparison.

showGroupDensity(GOdataVar.MF, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.MF, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.MF, orderedTerms[3], rm.one=FALSE)

Cellular component

orderedTerms <- names(sort(score(CCvar.weight01)))
significant.weight01 <- score(CCvar.weight01)[orderedTerms] <= 0.01
significant.lea      <- score(CCvar.lea)[orderedTerms] <= 0.01
significant.elim     <- score(CCvar.elim)[orderedTerms] <= 0.01
sigTerms <- orderedTerms[significant.weight01 & significant.lea & significant.elim]
if (length(sigTerms) == 0) {
   sigTerms <- orderedTerms[significant.lea & significant.elim]
}
# sigTerms gets updated, and is already used in the code, but I need a permanent copy for later.
CC.variance.sigTerms <- sigTerms

CCvar.all <- GenTable(GOdataVar.CC, elim=CCvar.elim, weight01=CCvar.weight01, lea=CCvar.lea,
                      orderBy="weight01", ranksOf="elim", topNodes=max(sum(significant.elim), 10))

kable(CCvar.all,
   caption = "Most over-represented terms of the Cellular Component ontology.")
Most over-represented terms of the Cellular Component ontology.
GO.ID Term Annotated Significant Expected Rank in elim elim weight01 lea
GO:0005852 eukaryotic translation initiation factor… 13 0 1.27 1 0.00031 0.00031 0.00031
GO:0032991 protein-containing complex 886 84 86.41 12 0.01080 0.00229 8.5e-05
GO:0000790 nuclear chromatin 24 3 2.34 41 0.03543 0.00231 0.03543
GO:0044428 nuclear part 236 34 23.02 7 0.00424 0.00354 3.4e-05
GO:0016272 prefoldin complex 7 0 0.68 8 0.00484 0.00484 0.00484
GO:0005680 anaphase-promoting complex 7 4 0.68 9 0.00648 0.00648 0.00648
GO:0005783 endoplasmic reticulum 68 4 6.63 128 0.29861 0.00773 0.29861
GO:0030015 CCR4-NOT core complex 8 4 0.78 11 0.00806 0.00806 0.00806
GO:1902562 H4 histone acetyltransferase complex 10 2 0.98 35 0.02728 0.00821 0.02728
GO:0005737 cytoplasm 634 53 61.83 42 0.03675 0.01317 0.02464
GO:0044451 nucleoplasm part 96 14 9.36 6 0.00349 0.01954 0.00349
vennDiagram(vennCounts(cbind(weight01=significant.weight01,
                             lea=significant.lea,
                             elim=significant.elim)))

kable(data.frame(Term=Term(GOTERM[sigTerms]),
                 Definition=Definition(GOTERM[sigTerms]),
                 PValue=score(CCvar.weight01)[sigTerms]),
      caption = paste('Cellular component terms significantly associated with', VAR,
                      'according to all 3 algorithms', sep=' '))
Cellular component terms significantly associated with hatching according to all 3 algorithms
Term Definition PValue
GO:0005852 eukaryotic translation initiation factor 3 complex A complex of several polypeptides that plays at least two important roles in protein synthesis: First, eIF3 binds to the 40S ribosome and facilitates loading of the Met-tRNA/eIF2.GTP ternary complex to form the 43S preinitiation complex. Subsequently, eIF3 apparently assists eIF4 in recruiting mRNAs to the 43S complex. The eIF3 complex contains five conserved core subunits, and may contain several additional proteins; the non-core subunits are thought to mediate association of the complex with specific sets of mRNAs. 0.0003090
GO:0044428 nuclear part Any constituent part of the nucleus, a membrane-bounded organelle of eukaryotic cells in which chromosomes are housed and replicated. 0.0035361
GO:0016272 prefoldin complex A multisubunit chaperone that is capable of delivering unfolded proteins to cytosolic chaperonin, which it acts as a cofactor for. In humans, the complex is a heterohexamer of two PFD-alpha and four PFD-beta type subunits. In Saccharomyces cerevisiae, it also acts in the nucleus to regulate the rate of elongation by RNA polymerase II via a direct effect on histone dynamics. 0.0048382
GO:0005680 anaphase-promoting complex A ubiquitin ligase complex that degrades mitotic cyclins and anaphase inhibitory protein, thereby triggering sister chromatid separation and exit from mitosis. Substrate recognition by APC occurs through degradation signals, the most common of which is termed the Dbox degradation motif, originally discovered in cyclin B. 0.0064777
GO:0030015 CCR4-NOT core complex The core of the CCR4-NOT complex. In Saccharomyces the CCR4-NOT core complex comprises Ccr4p, Caf1p, Caf40p, Caf130p, Not1p, Not2p, Not3p, Not4p, and Not5p. 0.0080568
showSigOfNodes(GOdataVar.CC, score(CCvar.weight01),
               firstSigNodes = sum(significant.weight01),
               wantedNodes = sigTerms)

## $dag
## A graphNEL graph with directed edges
## Number of Nodes = 46 
## Number of Edges = 78 
## 
## $complete.dag
## [1] "A graph with 46 nodes."

For comparison, I plot the distribution of variance portion for the CC term found most significant when using p-values.

showGroupDensity(GOdataVar.CC, orderedTerms[1], rm.one=FALSE)

showGroupDensity(GOdataVar.CC, orderedTerms[2], rm.one=FALSE)

showGroupDensity(GOdataVar.CC, orderedTerms[3], rm.one=FALSE)

Comparison between the two ordering of genes.

Biological process

allTerms <- usedGO(GOdata.BP)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% BP.pvalue.sigTerms,
                             Variance = allTerms %in% BP.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(BP.weight01))[allTerms],
                         Variance = rank(score(BPvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of BP terms by significance')

Molecular function

allTerms <- usedGO(GOdata.MF)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% MF.pvalue.sigTerms,
                             Variance = allTerms %in% MF.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(MF.weight01))[allTerms],
                         Variance = rank(score(MFvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(methdo='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of MF terms by significance')
## Warning: Ignoring unknown parameters: methdo

Cellular component

allTerms <- usedGO(GOdata.CC)
vennDiagram(vennCounts(cbind(PValue   = allTerms %in% CC.pvalue.sigTerms,
                             Variance = allTerms %in% CC.variance.sigTerms)))

ggplot(data = data.frame(PValue   = rank(score(CC.weight01))[allTerms],
                         Variance = rank(score(CCvar.weight01))[allTerms]),
       mapping = aes(x = PValue, y = Variance)) +
   geom_point() + geom_smooth(method='lm') + xlab('Using gene p-values') +
   ylab('Using portion of explained variance') +
   ggtitle('Ordering of CC terms by significance')

Session info

I save the main variables to be able to load them in a new R session later.

save(allgenes2GO,
     GOdata.BP, BP.elim, BP.weight01, BP.lea, BP.pvalue.sigTerms,
     GOdata.MF, MF.elim, MF.weight01, MF.lea, MF.pvalue.sigTerms,
     GOdata.CC, CC.elim, CC.weight01, CC.lea, CC.pvalue.sigTerms,
     GOdataVar.BP, BPvar.elim, BPvar.weight01, BPvar.lea, BP.variance.sigTerms,
     GOdataVar.MF, MFvar.elim, MFvar.weight01, MFvar.lea, MF.variance.sigTerms,
     GOdataVar.CC, CCvar.elim, CCvar.weight01, CCvar.lea, CC.variance.sigTerms,
     file = paste('Enrichment', TAG, VAR, 'RData', sep='.'))
sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 18.04.5 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
## 
## locale:
##  [1] LC_CTYPE=ca_ES.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=ca_ES.UTF-8        LC_COLLATE=ca_ES.UTF-8    
##  [5] LC_MONETARY=ca_ES.UTF-8    LC_MESSAGES=ca_ES.UTF-8   
##  [7] LC_PAPER=ca_ES.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=ca_ES.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
##  [1] grid      stats4    parallel  stats     graphics  grDevices utils    
##  [8] datasets  methods   base     
## 
## other attached packages:
##  [1] Rgraphviz_2.30.0     ggplot2_3.3.0        limma_3.42.2        
##  [4] knitr_1.28           topGO_2.38.1         SparseM_1.78        
##  [7] GO.db_3.10.0         AnnotationDbi_1.48.0 IRanges_2.20.2      
## [10] S4Vectors_0.24.3     Biobase_2.46.0       graph_1.64.0        
## [13] BiocGenerics_0.32.0 
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_1.0.0   xfun_0.12          purrr_0.3.3        splines_3.6.3     
##  [5] lattice_0.20-41    colorspace_1.4-1   vctrs_0.2.4        htmltools_0.4.0   
##  [9] yaml_2.2.1         mgcv_1.8-33        blob_1.2.1         rlang_0.4.5       
## [13] pillar_1.4.3       glue_1.4.0         withr_2.1.2        DBI_1.1.0         
## [17] bit64_0.9-7        matrixStats_0.55.0 lifecycle_0.2.0    stringr_1.4.0     
## [21] munsell_0.5.0      gtable_0.3.0       memoise_1.1.0      evaluate_0.14     
## [25] labeling_0.3       fansi_0.4.1        highr_0.8          Rcpp_1.0.4        
## [29] scales_1.1.0       farver_2.0.3       bit_1.1-15.2       digest_0.6.25     
## [33] stringi_1.4.6      dplyr_0.8.5        cli_2.0.2          tools_3.6.3       
## [37] magrittr_1.5       RSQLite_2.2.0      tibble_3.0.0       crayon_1.3.4      
## [41] pkgconfig_2.0.3    Matrix_1.2-18      ellipsis_0.3.0     assertthat_0.2.1  
## [45] rmarkdown_2.1      R6_2.4.1           nlme_3.1-149       compiler_3.6.3